feat: content blocking detection for headless browsers #73
Merged
Add contentSelectors and contentBlockedIndicators fields to the X (Twitter) provider entry. These define the DOM selectors and text patterns used to detect when X.com blocks headless browsers from viewing feed content. Update the provider notes to document this blocking behavior.
Add a new detectContentBlocked() function that detects when sites serve a page but block its actual content from headless browsers. It checks five heuristics in order, and any single match marks the page as blocked (OR logic): provider-specific blocked selectors, provider-specific blocked text patterns, empty content areas, generic error text with a short body, and persistent loading indicators. Also export CONTENT_BLOCKED_TEXT_PATTERNS.
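The heuristic chain can be sketched as follows. This is a minimal TypeScript sketch, not the merged implementation: PageLike and ElementLike are hypothetical structural stand-ins for a Playwright page and element handle, the { selectors, textPatterns } shape of contentBlockedIndicators is assumed, and GENERIC_ERROR_PATTERNS and LOADING_SELECTORS are illustrative placeholders rather than the real CONTENT_BLOCKED_TEXT_PATTERNS and LOADING_INDICATOR_SELECTORS. The 200-character default matches the emptyContentThreshold default named in the PR's tests; reusing it as the "short body" cutoff for heuristic 4 is an assumption.

```typescript
// Hypothetical structural types so the sketch runs without Playwright installed.
interface ElementLike {
  textContent(): Promise<string | null>;
  isVisible(): Promise<boolean>;
}
interface PageLike {
  $(selector: string): Promise<ElementLike | null>;
}

// Assumed shape of the provider's contentBlockedIndicators field.
type BlockIndicators = { selectors?: string[]; textPatterns?: string[] };

interface BlockConfig {
  contentSelectors?: string[];
  contentBlockedIndicators?: BlockIndicators;
  emptyContentThreshold?: number;
}

interface BlockResult {
  blocked: boolean;
  reason?: string;
}

// Illustrative placeholders, not the module's exported constants.
const GENERIC_ERROR_PATTERNS: RegExp[] = [/something went wrong/i, /try again/i];
const LOADING_SELECTORS: string[] = ['[role="progressbar"]', '[aria-busy="true"]'];

// Returns the element's text, '' for an empty element, or null if it is missing or the DOM call fails.
async function safeText(page: PageLike, selector: string): Promise<string | null> {
  try {
    const el = await page.$(selector);
    return el ? (await el.textContent()) ?? '' : null;
  } catch {
    return null;
  }
}

// True if the selector matches (and, optionally, is visible); DOM errors count as "not present".
async function isPresent(page: PageLike, selector: string, mustBeVisible = false): Promise<boolean> {
  try {
    const el = await page.$(selector);
    if (!el) return false;
    return mustBeVisible ? await el.isVisible() : true;
  } catch {
    return false;
  }
}

async function detectContentBlockedSketch(page: PageLike, cfg: BlockConfig = {}): Promise<BlockResult> {
  const threshold = cfg.emptyContentThreshold ?? 200;
  const indicators: BlockIndicators = cfg.contentBlockedIndicators ?? {};
  // Fetch the body text once and reuse it for heuristics 2 and 4.
  const bodyText = (await safeText(page, 'body')) ?? '';
  const lowerBody = bodyText.toLowerCase();

  // 1. Provider-specific blocked selectors.
  for (const sel of indicators.selectors ?? []) {
    if (await isPresent(page, sel)) return { blocked: true, reason: `blocked selector: ${sel}` };
  }
  // 2. Provider-specific blocked text (case-insensitive substring match).
  for (const pattern of indicators.textPatterns ?? []) {
    if (lowerBody.includes(pattern.toLowerCase())) return { blocked: true, reason: `blocked text: ${pattern}` };
  }
  // 3. A content area exists but holds less text than the threshold.
  for (const sel of cfg.contentSelectors ?? []) {
    const text = await safeText(page, sel);
    if (text !== null && text.trim().length < threshold) {
      return { blocked: true, reason: `empty content area: ${sel}` };
    }
  }
  // 4. Generic error text on a short page.
  if (bodyText.trim().length < threshold && GENERIC_ERROR_PATTERNS.some((re) => re.test(bodyText))) {
    return { blocked: true, reason: 'generic error text with short body' };
  }
  // 5. A loading indicator is still visible after the page reported loaded.
  for (const sel of LOADING_SELECTORS) {
    if (await isPresent(page, sel, true)) return { blocked: true, reason: `loading indicator visible: ${sel}` };
  }
  return { blocked: false };
}
```

A Playwright Page should satisfy PageLike structurally, since page.$() resolves to an ElementHandle (or null) that exposes textContent() and isVisible(). Swallowing DOM errors inside the helpers means a detached or navigating page yields "not blocked" rather than crashing the goto action.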
Expand the addInitScript block with additional stealth measures:
- Spoof the window.chrome object (present in real Chrome, missing in headless)
- Spoof navigator.plugins with a non-empty PluginArray-like object
- Set navigator.languages to ['en-US', 'en']
- Override the WebGL vendor/renderer to Intel Inc. / Intel Iris OpenGL Engine
- Override permissions.query for 'notifications' to return a denied state
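The five stealth measures could be combined into one self-contained init script along these lines. This is a sketch under stated assumptions: applyStealthPatches is a hypothetical name, the plugin entries are illustrative values, and the permissions override returns a plain { state: 'denied' } object rather than a real PermissionStatus. The WebGL override targets the UNMASKED_VENDOR_WEBGL (0x9245) and UNMASKED_RENDERER_WEBGL (0x9246) parameters of the WEBGL_debug_renderer_info extension, which is where fingerprinting scripts typically read the GPU vendor and renderer.

```typescript
// Hypothetical helper: applies the stealth patches to a window-like object.
// Defaults to globalThis so it can run unchanged as a browser init script.
function applyStealthPatches(win: any = globalThis): void {
  const nav = win.navigator;

  // 1. window.chrome exists in real Chrome but is missing in headless builds.
  if (!win.chrome) {
    win.chrome = { runtime: {} };
  }

  // 2. Non-empty, PluginArray-like navigator.plugins (illustrative entries).
  const fakePlugins = [
    { name: 'PDF Viewer', filename: 'internal-pdf-viewer', description: 'Portable Document Format' },
    { name: 'Chrome PDF Viewer', filename: 'internal-pdf-viewer', description: 'Portable Document Format' },
  ];
  const pluginArray = Object.assign([...fakePlugins], {
    item: (i: number) => fakePlugins[i] ?? null,
    namedItem: (name: string) => fakePlugins.find((p) => p.name === name) ?? null,
  });
  Object.defineProperty(nav, 'plugins', { configurable: true, get: () => pluginArray });

  // 3. A realistic language list.
  const languages = ['en-US', 'en'];
  Object.defineProperty(nav, 'languages', { configurable: true, get: () => languages });

  // 4. Report an Intel GPU for the unmasked vendor/renderer queries only.
  const UNMASKED_VENDOR_WEBGL = 0x9245;
  const UNMASKED_RENDERER_WEBGL = 0x9246;
  const glCtor = win.WebGLRenderingContext;
  if (glCtor) {
    const originalGetParameter = glCtor.prototype.getParameter;
    glCtor.prototype.getParameter = function (this: unknown, param: number) {
      if (param === UNMASKED_VENDOR_WEBGL) return 'Intel Inc.';
      if (param === UNMASKED_RENDERER_WEBGL) return 'Intel Iris OpenGL Engine';
      return originalGetParameter.call(this, param);
    };
  }

  // 5. permissions.query('notifications') resolves to a denied state; other names pass through.
  const perms = nav.permissions;
  if (perms && typeof perms.query === 'function') {
    const originalQuery = perms.query.bind(perms);
    perms.query = (desc: { name: string }) =>
      desc && desc.name === 'notifications' ? Promise.resolve({ state: 'denied' }) : originalQuery(desc);
  }
}
```

Because the function references no outer names, it could be handed directly to Playwright's page.addInitScript(applyStealthPatches), where win falls back to the page's globalThis (window). Taking the window as a parameter mainly lets the patches be exercised against a fake object outside a browser.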
Import detectContentBlocked and add a matchProviderByDomain helper backed by a lazily built Map for O(1) provider lookup. After goto navigation and waitForLoaded complete, run content-blocking detection with the provider-specific config from providers.json. When blocking is detected, add contentBlocked, warning, reason, and suggestion fields to the result. Add a --no-content-block-detect flag to skip detection.
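The provider lookup, flag check, and result annotation might look roughly like this. Only contentSelectors and contentBlockedIndicators are named in the PR; the domains field, the sample X entry and its selector, the www. stripping, the getDomainIndex, contentBlockDetectEnabled, and annotateBlocked helpers, and the warning and suggestion strings are all hypothetical.

```typescript
interface ProviderEntry {
  name: string;
  domains: string[]; // hypothetical field
  contentSelectors?: string[];
  contentBlockedIndicators?: { selectors?: string[]; textPatterns?: string[] }; // assumed shape
}

// Hypothetical stand-in for the parsed providers.json.
const PROVIDERS: ProviderEntry[] = [
  { name: 'x', domains: ['x.com', 'twitter.com'], contentSelectors: ['[data-testid="primaryColumn"]'] },
];

// Domain -> provider index, built on first use and reused afterwards.
let domainIndex: Map<string, ProviderEntry> | null = null;

function getDomainIndex(): Map<string, ProviderEntry> {
  if (domainIndex === null) {
    const index = new Map<string, ProviderEntry>();
    for (const provider of PROVIDERS) {
      for (const domain of provider.domains) index.set(domain, provider);
    }
    domainIndex = index;
  }
  return domainIndex;
}

function matchProviderByDomain(url: string): ProviderEntry | undefined {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return undefined; // not a parseable absolute URL
  }
  // Treat www.example.com like example.com; otherwise a single exact Map lookup.
  return getDomainIndex().get(host.replace(/^www\./, ''));
}

// Detection is on by default and skipped when the opt-out flag is present.
function contentBlockDetectEnabled(argv: string[]): boolean {
  return !argv.includes('--no-content-block-detect');
}

interface GotoResult {
  url: string;
  contentBlocked?: boolean;
  warning?: string;
  reason?: string;
  suggestion?: string;
}

// Adds the blocking fields only when detection reported a block.
function annotateBlocked(result: GotoResult, detection: { blocked: boolean; reason?: string }): GotoResult {
  if (!detection.blocked) return result;
  return {
    ...result,
    contentBlocked: true,
    warning: 'Page loaded, but its content appears to be blocked for headless browsers', // illustrative
    reason: detection.reason,
    suggestion: 'Try a headed browser or an authenticated session', // illustrative
  };
}
```

Building the Map lazily means providers.json is only indexed when a goto actually needs it, and each later lookup is a single Map.get after normalizing the hostname.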
Add 19 tests covering all detection heuristics:
- Provider-specific blocked selectors and text patterns
- Empty content area detection with threshold
- Generic error text with short body
- Persistent loading indicators (visible vs. invisible)
- Error handling for page.$() and textContent() failures
- Default emptyContentThreshold of 200
- X.com-specific: empty feed, error state, no false positives
- CONTENT_BLOCKED_TEXT_PATTERNS export validation
…xports
Add two tests to the existing auth-wall-detect test suite confirming that the new detectContentBlocked function and CONTENT_BLOCKED_TEXT_PATTERNS constant are properly exported from the module.
…e case tests
- Cache the bodyText fetch in detectContentBlocked to avoid a redundant DOM query
- Export LOADING_INDICATOR_SELECTORS for testability
- Add an edge-case test for an empty contentSelectors array
- Add LOADING_INDICATOR_SELECTORS validation tests
This was referenced Feb 26, 2026
Summary
- detectContentBlocked() function to detect when sites serve pages but block content from headless browsers (e.g., X.com empty timelines)
- Content-block detection in the goto action, with a --no-content-block-detect flag to skip it
Test Plan
Related Issues
Closes #38